Higher education is in an age of accountability, with accrediting agencies, government 
leaders, taxpayers, parents, and students demanding to see outcomes for expended 
resources (Hoffman & Bresciani, 2010). The costs of higher education have been 
increasing well beyond the rate of inflation, student loan debt is higher than ever, and 
students, families, employers, and policymakers are all expressing some concern that a 
college education does not necessarily result in appropriate employment after graduation 
(Blimling, 2013). For student affairs professionals, this age of accountability translates 
into an age of assessment: We must demonstrate that the work we do is producing the 
desired results. 


Assessment in student affairs is “the process of 
collecting and analyzing information to improve 
the conditions of student life, student learning, 
or the quality and efficiency of services and pro- 
grams provided for students” (Blimling, 2013, 
p. 5). Educators were calling for improving 
student services through evaluation and research 
as far back as 1937 in the American Council on 
Education’s Student Personnel Point of View, and 
student affairs professionals have always engaged 
in various kinds of research and assessment. 
Assessment is more important today than ever— 
it plays a central role in the reform movement in 
higher education that began in the 1980s (Blim- 
ling, 2013). But student affairs professionals may 
not possess the necessary skills to conduct assess- 
ment effectively (Blimling, 2013; Hoffman & 
Bresciani, 2010). 

When student affairs professionals assess their 
work, they often employ some type of survey. The 
use of surveys stems from a desire to objectively 
measure outcomes, a demand from someone else 
(e.g., supervisor, accreditation committee) for 
data, or the feeling that numbers can provide an 
aura of competence. Although surveys are effec- 
tive tools for gathering information, many people 
don’t know how to create a survey that accurately 
measures what they want to know. Professionals 
may administer surveys that ask vague questions 
or that otherwise fail to capture needed knowl- 
edge. And once data are collected, researchers 
might not know what to do with the informa- 
tion besides compiling percentages of response 
choices. Better survey instruments will enable 
student affairs professionals to use the outcomes 
to drive decision making. 

This brief offers five suggestions for avoiding 
common mistakes in survey design and use, and 
for facilitating the development of high-quality 
surveys that can be used to gather data for evi- 
dence-based decisions. Senior administrators and 
their staffs can use the brief as an introductory 
guide and a checklist of the fundamentals of 
survey development, implementation, and data 
interpretation. 


Rather than merely proving that the work of 
student affairs matters, a well-done assessment 
should focus on improving such work. Research 
and assessment move the field forward by supplying 
information about what works for students, 
what does not work, and the causes of successes 
and failures. Just as medical doctors study and 
share the factors that promote health, student 
affairs practitioners have an obligation to empirically 
demonstrate effective methods of promoting 
student learning. If surveys are a primary tool 
for gaining this kind of knowledge, they must be 
done correctly. This brief is a starting point for 
better surveys and more robust analysis of the 
data resulting from them. 


Figure 1. Glossary of Common Terms in Survey Design and Implementation 

Classical measurement model: a common model for developing scales that analyzes how responses to 
multiple items correlate with each other as a way of knowing how well the items measure the true 
score of the latent variable; also referred to as Classical Test Theory. 

Conclusion validity: the extent to which conclusions based on the data are logical and appropriate. 

Construct validity: the extent to which a scale behaves according to predictions. 

Content validity: the extent to which a scale measures what it is supposed to measure. 

Criterion-related validity: providing validity evidence by comparing a scale to an external 
criterion (e.g., another scale) with proven validity. 

Cronbach’s coefficient alpha: a common statistical method for measuring reliability by analyzing 
how responses to each item in a scale relate to each other. 

Error: represents lack of accuracy in a scale by taking the difference between a perfectly accurate 
scale (1) and a scale’s reliability (i.e., Cronbach’s alpha), which ranges from 0 to 1. 

Latent variable: an underlying psychological construct that a scale attempts to measure. 

Likert response format: a common format for measurement that presents respondents with statements 
and asks to what degree they agree or disagree with the statements. 

Measurement: the assignment of numbers to represent different degrees of a quality or property. 

Psychometrics: a subspecialty directly concerned with the measurement of psychological phenomena. 

Reliability: the extent to which a scale performs in consistent, predictable ways. 

Scale: a measurement instrument composed of a collection of items that combine into a total score 
to reveal the level of a latent variable. 

True score: the actual (not measurable) level of a latent variable. 

Validity: the extent to which evidence and theory demonstrate that the correct latent variable is 
measured and the resulting conclusions are appropriate. 

1 Don’t Lose Sight of What You Want to Know 


Every survey starts with a desire to know something. 
For example, we work in a residence hall and we want 
our students to feel a sense of community and con- 
nection. At the end of the fall semester, we measure 
their sense of belonging in their residence hall. This 
knowledge will help us make programmatic and 
policy decisions for the spring semester. 

Sense of belonging is the latent variable we want 
to measure. Latent means underlying or hidden. Of 
course, we cannot take a tape measure, walk up to 
students, and ask them if we can measure their sense 
of belonging. Measuring something psychological in 
nature is quite different from measuring something 
physical. The process is messier, a condition that 
confronts student affairs professionals often in their 
assessment efforts. 

The students’ actual level of belonging is called 
the true score. We can never know the true score 
because we cannot enter into the minds of other 
people and measure the level of anything. But there is 
hope. Although we cannot directly measure sense of 
belonging, we can measure it indirectly and approximate 
the true score. To do this, we use the classical 
measurement model, also known as classical test 
theory. The classical measurement model states that 
if we attempt to measure a latent variable multiple 
times, we can approximate how close we get to the 
true score. 
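
A small simulation can help build intuition for this idea. The sketch below assumes the usual classical test theory decomposition, in which each observed response equals the true score plus random error. The true score of 4.2, the error spread, and the function name `simulate_item_responses` are invented for illustration and do not come from any real survey.

```python
import random
import statistics


def simulate_item_responses(true_score, n_items, error_sd, seed=0):
    """Simulate n_items noisy measurements of one hidden true score.

    Assumption (classical test theory): observed = true score + random error.
    """
    rng = random.Random(seed)
    return [true_score + rng.gauss(0, error_sd) for _ in range(n_items)]


# Hypothetical student whose (unknowable) sense of belonging is 4.2.
TRUE_SCORE = 4.2

# One measurement versus many. A real scale has far fewer items;
# the large number here only makes the pattern easy to see.
few = simulate_item_responses(TRUE_SCORE, n_items=1, error_sd=1.0)
many = simulate_item_responses(TRUE_SCORE, n_items=1000, error_sd=1.0)

print("estimate from 1 measurement:", round(statistics.mean(few), 2))
print("estimate from 1000 measurements:", round(statistics.mean(many), 2))
```

In this simplified setup the random errors tend to cancel when many measurements are averaged, which is why a scale uses several items rather than a single question.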


2 Don’t Just Ask, Measure 


Survey is a general term referring to any type of 
questionnaire that gathers information from a sample 
group of individuals. Often, what we really mean 
when we use the term survey is a scale. The technical 
difference is that a survey merely collects information, 
while a scale attempts to measure something by asking 
respondents to assign numbers that represent various 
degrees of a specific quality or property. 

In the 1670s, Sir Isaac Newton attempted to 
measure things by making multiple observations and 
averaging the results for a simple representation of all 
the data. More than a century later, Charles Darwin 
observed and measured variations across species, which 
led to the development of formal statistical methods. 




Sir Francis Galton, Darwin’s cousin, extended this 
idea of observing and measuring variation to humans. 
Thus, psychometrics was born—a subspecialty con- 
cerned with the measurement of psychological and 
social phenomena (DeVellis, 2012). Student affairs 
professionals venture into psychometrics when they 
design and administer surveys and scales. 

What exactly is a scale? A scale is a measurement 
instrument composed of multiple questions or items. 
The responses to these items are combined into a total 
score (usually by simply adding them) that represents 
the level of the latent variable. In our example, we 
want to measure students’ sense of belonging in the 
residence hall. Because we cannot know the actual 
level (true score) of their sense of belonging (latent 
variable), we use multiple items to indirectly measure 
it. Our scale takes the form of a series of statements 
such as these: 

• I am making friends with other students in my residence hall. 
• I am part of a community in my residence hall. 
• I feel as though I belong in my residence hall. 
• Other students in my residence hall like me. 
• My residence hall feels like a home to me. 


Some redundancy or overlap among items is a 
good thing. Note that none of these items attempts 
to measure student behaviors; for example, “How 
often do you talk with others in your residence hall?” 
Research has shown that college students rarely accu- 
rately report on their behaviors, especially frequent, 
mundane activities (Porter, 2011). Therefore, profes- 
sionals should not try to measure specific behaviors; 
instead, they should focus on attitudes, perceptions, 
and levels of satisfaction. Each of the items in our 
example is an attempt to indirectly measure sense 
of belonging. But how do students respond to these 
items? This brings us to the issue of response format. 


3 Don’t Create Your Own Survey Format 


Once we have a set of questions or items, we must 
choose a format for measurement. You might be 
tempted to create your own format for responses, but 
it is not necessary to reinvent the wheel—scholars 
have developed, tested, and published various formats 
that capture information efficiently. DeVellis (2012) 
offered an excellent summary of response formats, 
along with their strengths and weaknesses. 




The most popular response format is the Likert 
scale, which uses declarative statements followed 
by response options that indicate varying degrees of 
agreement or disagreement with the statements (e.g., 
from strongly agree to strongly disagree). Some scales 
include a neutral choice (i.e., neither agree nor dis- 
agree), but respondents tend to gravitate toward this 
choice, so many researchers remove this option and 
force the respondents to choose between agreement 
and disagreement. Unless specific research needs 
dictate otherwise, a six-point Likert scale with the fol- 
lowing options works best: (1) strongly disagree, (2) 
moderately disagree, (3) slightly disagree, (4) slightly 
agree, (5) moderately agree, (6) strongly agree. A 
six-point scale provides enough options to capture 
various levels of the latent variable, forces respondents 
to choose between agreement and disagreement on 
each item, and does not have so many options that it 
confuses the respondent. 
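
The six-point coding and the total score can be sketched in a few lines of Python. The labels and their numbers come from the list above; the dictionary name `LIKERT_CODES`, the function `total_score`, and the sample answers are hypothetical and only illustrate the arithmetic.

```python
# Six-point Likert coding from the text: 1 = strongly disagree ... 6 = strongly agree.
LIKERT_CODES = {
    "strongly disagree": 1,
    "moderately disagree": 2,
    "slightly disagree": 3,
    "slightly agree": 4,
    "moderately agree": 5,
    "strongly agree": 6,
}


def total_score(answers):
    """Convert one respondent's text answers to numbers and add them up."""
    return sum(LIKERT_CODES[answer.strip().lower()] for answer in answers)


# Hypothetical answers from one student to the five sense-of-belonging items.
student_answers = [
    "Moderately agree",
    "Slightly agree",
    "Strongly agree",
    "Slightly disagree",
    "Moderately agree",
]

print(total_score(student_answers))  # 5 + 4 + 6 + 3 + 5 = 23
```

With five items coded 1 to 6, every total falls between 5 and 30, which gives a rough sense of where a given student sits.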

We now have items to measure sense of belong- 
ing and response choices for our items. But how do 
we know that these items actually capture sense of 
belonging? In other words, how do we ensure that our 
items are measuring what we intend to measure? 


4 Don’t Be Afraid of Validity 


Validity pertains to whether we are measuring the 
correct variable rather than accidentally measuring 
something else and whether we are reaching the 
appropriate conclusions based on our findings. Our 
goal is to measure sense of belonging in a residence 
hall. If our items capture aspects of a sense of belong- 
ing, they are valid; if they do not, they lack validity. 

There are lots of ways to think about validity. We 
will focus on four of them, each beginning with the 
letter “c”: (1) content validity, (2) criterion-related 
validity, (3) construct validity, and (4) conclusion 
validity (American Educational Research Association, 
American Psychological Association, and National 
Council on Measurement in Education, 1999). 

Content validity refers to the extent to which our 
scale measures what it is supposed to measure. How 
do we determine whether our items have content 
validity? Measurement will be more accurate if the 
latent variable is well defined and if the items on the 
scale directly link to this definition. 

The first step in the process is to look at research 
literature on the topic, for help in defining our latent 
variable. For our example, we refer to an article in the 
Journal of Student Affairs Research and Practice that 
examines living-learning communities and students’ 
sense of community and belonging. Spanierman 
et al. (2013) cited previous literature that defines 
sense of community as “a feeling that members have 
of belonging, a feeling that members matter to one 
another and to the group, and a shared faith that 
members’ needs will be met” (McMillan & Chavis, 
1986, p. 9). For our survey, we define sense of belong- 
ing as the extent to which a student connects with 
other students in the residence hall and feels part of 
the residence hall community. 

Now that we have a definition, the second step is 
to use the literature as a guide in creating items for 
our scale. Essentially, we take what we learn about 
sense of belonging from the literature and turn it 
into statements with which students can agree or 
disagree. This is how we developed the statements in 
the bulleted list above. 

The third step is to ask experts to review our items. 
How we define “expert” depends on the magnitude of 
our research quest. If we are creating a scale for distri- 
bution at the national level, it is worth asking national 
experts on sense of belonging among college students 
to review our items and provide feedback. But if we 
are going to use this scale for one residence hall, we can 
simply ask other student affairs professionals at our 
institution who have graduate degrees in a field related 
to higher education and who are familiar with sense of 
belonging to review our items. The point is not to create 
more work than necessary; we just want to ensure some 
level of external review and outside perspective. 

Figure 2. 8 Questions to Ask Yourself in Developing a Survey 

What do I want to measure? Determine and define the latent variable. 

What will I ask in order to measure it? Use previous research on the 
topic to generate items for your scale. 

How will I measure it? A six-point Likert scale is a common, 
well-tested method. 

How will I check for validity? Have people who are knowledgeable in 
the content area review your items. Ask them about item clarity, item 
conciseness, and other possible items to include. Ask yourself what the 
resulting scores will mean and what conclusions you can and cannot draw 
from the scores. 

How will I test the survey? Organize a focus group to pilot your 
instrument and provide you with feedback on its clarity. 

How will I administer the survey? Plan ahead for how participants will 
receive the survey and how you can encourage them to respond. 

How will I check reliability? Reliability is easy to check with 
statistical software. If you are not familiar with such software, find 
someone on your campus who will not mind taking a few minutes to run it 
for you. 

What will I do with my results? Once you have determined that your 
scale is reliable, you can sum the responses of the items to get a total 
score that measures your latent variable. What will you do with this 
information? How will you determine that a specific score is good or 
bad? How will the information help you make evidence-based decisions? 
To avoid misuse of your data, define in advance, for yourself and 
others, what your results are not saying. You can use statistical 
procedures to compare the scores of subgroups or to determine what 
other variables might help predict the results of your latent variable. 

Another way to demonstrate validity is to compare 
our survey with one that is already considered valid. 
This is known as criterion-related validity. To do this, 
we must find another scale that captures a latent variable 
similar to ours. Spanierman et al. (2013) used the 
sense of belonging scale created by Bollen and Hoyle 
(1990). This scale measures sense of belonging generally 
and does not specifically refer to a residence hall, 
so we will still have to create most of our own items. 
However, we could include one or two of Bollen and 
Hoyle’s items in our scale and compare the responses. 
If the responses to our items and to the external items 
are similar, we have evidence of criterion-related 
validity. However, if students respond in drastically 
different ways to our items and to the Bollen-Hoyle 
items, we should probably question the validity of our 
own items and try again. If we cannot find another 
valid scale that measures the same latent variable we 
are trying to measure, we might be able to use some 
form of external data. For example, if our latent vari- 
able directly related to academic success, we might be 
able to use GPA. If we can find an appropriate exter- 
nal criterion, this kind of validity is helpful to ensure 
that we are measuring what we intend to measure. 
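
One way to make "similar responses" concrete is to correlate each student's total on our own items with the same student's total on the external items. The brief does not prescribe a statistic, so the Pearson correlation used below is an assumption, and the function name `pearson_r` and all of the numbers are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up data for eight students (not real survey results).
# our_items: total on our five residence hall items (possible range 5-30).
# criterion: total on two borrowed external items (possible range 2-12).
our_items = [27, 12, 22, 30, 9, 18, 25, 15]
criterion = [11, 5, 9, 12, 3, 7, 10, 6]

r = pearson_r(our_items, criterion)
print(f"r = {r:.2f}")
```

A correlation close to 1 would suggest that the two sets of items rank students in much the same way. How high the value must be before it counts as evidence of criterion-related validity is a judgment call that this sketch does not settle.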




A third form of validity is construct validity. 
Construct validity analyzes the extent to which a 
scale behaves the way it should. In other words, does 
it sort out results the way we would expect? We evalu- 
ate construct validity through statistical analyses such 
as exploratory factor analysis, confirmatory factor 
analysis, and principal components analysis. In these 
analyses, statistical software programs identify the 
responses that “hang together.” In our survey, we hope 
students will generally answer positively or negatively 
on all the items (item responses hanging together). 
This would show that the scale is truly measuring 
sense of belonging. But if students respond positively 
on the first three items and negatively on the last two 
items (item responses not hanging together), we must 
suspect that we are measuring two different latent 
variables. Statistical analyses require a large sample 
size (300 as a general rule of thumb); thus, determin- 
ing construct validity is an advanced technique and 
not always practical. 

The fourth type of validity is conclusion validity. 
Conclusion validity is the extent to which the conclu- 
sions we draw from the results of the scale are logical. 
In our example, if scores on our sense of belonging 
scale are generally high, we might state that students in 
our residence hall are happy. But we would be wrong: 
Happiness was not our latent variable. Claims about 
our students’ happiness lack conclusion validity. 
On the other hand, if our scores are generally low, we 
might conclude that students desire a greater sense 
of belonging in their residence hall. But we did not 
measure what students want, did we? On the basis of 
previous literature, we can say that a sense of belong- 
ing is an important part of the college experience, but 
we cannot make claims about what our students want 
or do not want, because we did not measure that vari- 
able. We measured students’ beliefs about their current 
sense of belonging, and our conclusions need to remain 
there. Our data tell us to what extent students feel a 
sense of belonging in their residence hall. To check 
our conclusion validity, we should ask whether our 
interpretations of the data are logical and appropriate. 

Validity pertains to the fundamental issues 
of whether we measure the variable we intend to 
measure and whether our conclusions make sense. 
All four types of validity—content, criterion-related, 
construct, and conclusion—are important. However, 
trying to establish criterion-related or construct 
validity can overwhelm student affairs professionals, 
especially those who have little background in survey 
development or statistics. Student affairs profession- 
als should start with content and conclusion validity. 
Those two types of validity are more conceptual than 
statistical in nature, and are the most important ones. 

Once we are confident that our scale measures 
the correct latent variable, we can think about accuracy: 
How well does it measure the variable? We call 
this reliability. 


5 Don’t Ignore Reliability 


A scale with reliability is one that performs in consis- 
tent, predictable, and fairly accurate ways. Earlier, we 
established that we can never know the true score for 
our latent variable. How can we know the precision 
of our scale if we don’t know the true score? In other 
words, how can we know whether our scale accurately 
measures sense of belonging if we cannot peer into 
students’ souls? Reliability is a clever statistical analysis 
to get around this problem. First, accuracy is placed 
on a measurement scale from 0 (totally inaccurate) to 
1 (perfectly accurate). Although we do not know the 
true score, we can assign the true score a measurement 
of 1, because 1 is perfectly accurate. Therefore, we now 
know that reliability varies from 0 to 1 and that 1 rep- 
resents a perfectly reliable scale. 

We may not know how each of our items compares 
with the true score, but we can calculate how each 
item relates with the other items. This is internal con- 
sistency reliability, commonly measured by a method 
called Cronbach’s coefficient alpha. Cronbach’s alpha 
is easy to calculate using statistical software that com- 
pares the relationship of the responses to each item 
with the responses to all the other items on the scale. 
After calculating all the relationships, it provides an 
overall reliability score between 0 and 1. Because we 
know that the true score is 1, the difference between 
our Cronbach's alpha and 1 must be due to error (lack 
of accuracy in the scale). If the Cronbach’s alpha of 
our five-item scale is 0.76, the error (the distance from 
perfect accuracy) is 0.24 (1 - 0.76 = 0.24). 
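
For readers curious about what the software does, the sketch below computes Cronbach’s alpha with the commonly used formula: the number of items k divided by k - 1, multiplied by one minus the ratio of the summed item variances to the variance of the total scores. The function name `cronbach_alpha` and the response data are hypothetical; a statistical package remains the practical choice for real analyses.

```python
import statistics


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores)),
    where k is the number of items.
    """
    k = len(responses[0])
    items = list(zip(*responses))  # regroup scores by item instead of by respondent
    item_variances = [statistics.variance(item) for item in items]
    total_variance = statistics.variance([sum(person) for person in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Made-up responses: six students x five items, coded 1-6 (not real data).
responses = [
    [5, 4, 6, 3, 5],
    [2, 2, 1, 3, 2],
    [6, 5, 6, 5, 6],
    [3, 4, 3, 2, 3],
    [4, 4, 5, 4, 4],
    [1, 2, 2, 1, 1],
]

alpha = cronbach_alpha(responses)
error = 1 - alpha  # "error" as this brief defines it
print(f"alpha = {alpha:.2f}, error = {error:.2f}")
```

The made-up students answer all five items in a very similar way, so the resulting alpha is high. Six respondents is far too few for a real analysis; the small table is only there to keep the arithmetic visible.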

What constitutes acceptable reliability? After 
decades of trial and error with scales, some rules of 
thumb have emerged. In general, a scale with a Cronbach’s 
alpha less than 0.60 is unreliable. A scale with an 
alpha between 0.60 and 0.65 is undesirable. An alpha 
between 0.65 and 0.70 is minimally acceptable. An 
alpha greater than 0.70 is good, greater than 0.80 is very 
good, and greater than 0.90 is excellent (DeVellis, 2012). 

We can also check how each individual item helps 
or hurts the overall reliability of the scale. Sometimes 
one item is particularly bad. We can remove that item 
and recalculate the overall reliability with the remaining 
items. Thus, we don’t need to know ahead of time 
which of our items are strong and which are weak. 
Instead, we do the best we can to create good items 
from the literature, gather responses to our scale, and 
then conduct reliability analyses that tell us whether 
we need to exclude any items from the final version. 
Determining the reliability of our scale is especially 
important when it comes to efforts to improve our 
assessment techniques, because assessments are only as 
good as the instruments that underlie them. 


CONCLUSION 


If we didn’t know any better, we might have asked 
the students in our residence hall one vague question 
(“What is your sense of belonging?”) without 
any thought of measurement, format, validity, or 
reliability. However, using fundamental principles of 
psychological measurement, we now have a five-item 
scale that measures sense of belonging in a residence 
hall in a valid and reliable manner. We can calculate 
each student’s sense of belonging score by simply 
summing the responses to the five items. If our reliability 
analysis told us that one of our items was hurting 
overall reliability, we would exclude that item and sum 
the other four. Once we have the total score for sense 
of belonging, we can use it in various statistical procedures 
to determine differences in sense of belonging 
between subgroups, variables that might relate to sense 
of belonging, or what might predict sense of belonging 
in students. We can administer the same scale at the 
end of the spring semester to identify any changes in 
sense of belonging. The bottom line is that we’ve taken 
an important step to improve practice in support of 
students and the work we do to assist in their personal 
development and overall learning. 
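
The item check described above, dropping an item that hurts reliability and summing the rest, can be sketched as follows. The code recomputes Cronbach’s alpha with each item left out in turn and labels the result with the rules of thumb this brief quotes from DeVellis (2012). The function names, the handling of values that fall exactly on a cutoff, and the response data are assumptions made for illustration; the fourth item is deliberately constructed to disagree with the others.

```python
import statistics


def cronbach_alpha(responses):
    """alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))."""
    k = len(responses[0])
    item_variances = [statistics.variance(item) for item in zip(*responses)]
    total_variance = statistics.variance([sum(person) for person in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def alpha_if_item_deleted(responses):
    """Recompute alpha with each item left out in turn."""
    k = len(responses[0])
    return [
        cronbach_alpha([person[:i] + person[i + 1:] for person in responses])
        for i in range(k)
    ]


def describe_alpha(alpha):
    """Label alpha with the rules of thumb quoted in this brief.

    Treatment of values exactly on a cutoff is an assumption.
    """
    if alpha > 0.90:
        return "excellent"
    if alpha > 0.80:
        return "very good"
    if alpha > 0.70:
        return "good"
    if alpha >= 0.65:
        return "minimally acceptable"
    if alpha >= 0.60:
        return "undesirable"
    return "unreliable"


# Made-up responses: six students x five items, coded 1-6 (not real data).
responses = [
    [5, 4, 6, 2, 5],
    [2, 2, 1, 5, 2],
    [6, 5, 6, 3, 6],
    [3, 4, 3, 1, 3],
    [4, 4, 5, 6, 4],
    [1, 2, 2, 4, 1],
]

full_alpha = cronbach_alpha(responses)
dropped = alpha_if_item_deleted(responses)
candidate = dropped.index(max(dropped))  # item whose removal helps the most

print(f"all items: {full_alpha:.2f} ({describe_alpha(full_alpha)})")
if dropped[candidate] > full_alpha:
    print(f"without item {candidate + 1}: {dropped[candidate]:.2f}")
    # Sum the remaining items to get each student's score.
    scores = [sum(person) - person[candidate] for person in responses]
else:
    scores = [sum(person) for person in responses]
print(scores)
```

Whether an item should actually be removed also depends on content validity: an item that lowers alpha may still cover part of the definition that the other items miss.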

The purpose of this brief is to offer suggestions 
to help you avoid common mistakes and develop 
high-quality scales. Understanding how to measure 
latent variables, how to structure a scale, and how to 
ensure validity and reliability is vital to survey 
development, and better surveys provide better information 
for better decisions. Graduate programs offer 
courses and even degrees in psychometrics—this brief 
is only an introduction. The additional resources pro- 
vided in Figure 3 are for those who want to learn more 
about survey development and implementation. 

The age of assessment and accountability in student 
affairs is here to stay. Rather than viewing assessment 
as a burden or an unpleasant obligation, student affairs 
professionals should embrace assessment for the sake 
of improving the student experience. Assessment, 
thoughtfully planned, implemented, and used, is the 
key to progress in student retention, engagement, 
achievement, and learning. Assessment can demonstrate 
what student affairs is doing well, highlight areas 
that need improvement, and help identify the programs, 
environments, and services that support positive 
educational outcomes. 